Potential performance improvements #16

Merged
merged 18 commits into from
Jul 8, 2024


rubdos (Member) commented Jul 7, 2024

I'm looking for some improvements; so far there are no measurable effects. At least we're allocating less often!
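The benchmarks in this thread measure both `decode` and `decode_into`, which suggests an allocating call and one that writes into a caller-supplied buffer; their real signatures are not shown here. A minimal sketch of that allocation-avoiding pattern, using hypothetical `render`/`render_into` functions (a grey gradient stands in for the actual decoding work):

```rust
/// Hypothetical stand-in for per-pixel decoding work: fills an RGBA buffer
/// with an opaque horizontal grey gradient.
fn render_into(out: &mut [u8], width: usize, height: usize) {
    assert_eq!(out.len(), width * height * 4, "buffer must hold width * height RGBA pixels");
    let denom = width.saturating_sub(1).max(1);
    for (i, px) in out.chunks_exact_mut(4).enumerate() {
        let x = i % width;
        let shade = (x * 255 / denom) as u8;
        px.copy_from_slice(&[shade, shade, shade, 255]);
    }
}

/// Allocating convenience wrapper: every call creates a fresh Vec.
fn render(width: usize, height: usize) -> Vec<u8> {
    let mut out = vec![0u8; width * height * 4];
    render_into(&mut out, width, height);
    out
}

fn main() {
    let (w, h) = (4, 2);
    // Allocate once, then reuse the same buffer for every frame.
    let mut buf = vec![0u8; w * h * 4];
    for _frame in 0..3 {
        render_into(&mut buf, w, h);
    }
    // Both paths produce identical pixels; only the allocation pattern differs.
    assert_eq!(buf, render(w, h));
    println!("{:?}", &buf[..8]);
}
```

The wrapper keeps the convenient API, while hot loops can call the `_into` variant and pay for the allocation only once.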

rubdos (Member, Author) commented Jul 7, 2024

Results of what I could optimize by eye so far:

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [2.7536 ms 2.7607 ms 2.7682 ms]
                        change: [-6.8198% -6.4337% -6.0751%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [2.7061 ms 2.7117 ms 2.7181 ms]
                        change: [-9.6423% -9.3190% -9.0130%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  14 (14.00%) high severe

     Running benches/encode.rs (target/release/deps/encode-a9d404ceb3500bd9)
encode data/octocat.png time:   [2.8060 ms 2.8099 ms 2.8146 ms]
                        change: [+0.4714% +0.7873% +1.0953%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  10 (10.00%) high mild
  8 (8.00%) high severe

Any further improvements will require some more fine-grained benchmarking.
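A minimal sketch of finer-grained timing using only the standard library (not Criterion, which produced the numbers in this thread): time one hot operation in isolation. Here the target is `powf(v, 1/2.4)`, the sRGB-curve step that appears in the `linear_to_srgb` diff later in the thread, chosen purely as an example; `time_per_call` is a hypothetical helper.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times (after a short warm-up) and returns the mean
/// wall-clock time per call. `iters` must be non-zero.
fn time_per_call<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "iters must be non-zero");
    // Warm-up: let caches, branch predictors and CPU clocks settle.
    for _ in 0..(iters / 10).max(1) {
        f();
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    // Inputs spanning [0, 1], as a stand-in for linear colour values.
    let inputs: Vec<f32> = (0..1024).map(|i| i as f32 / 1023.0).collect();

    // Time only the transcendental part of the sRGB curve.
    let per_batch = time_per_call(1_000, || {
        for &v in &inputs {
            // black_box keeps the optimiser from hoisting or deleting the work.
            black_box(black_box(v).powf(1.0 / 2.4));
        }
    });
    println!("powf: {:?} per batch of {} values", per_batch, inputs.len());
}
```

A single mean like this is only a rough guide; Criterion's outlier detection and confidence intervals are more trustworthy for deciding whether a change helps.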

rubdos (Member, Author) commented Jul 8, 2024

Benchmarking decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.2540 ms 1.2562 ms 1.2585 ms]
                        change: [-59.901% -59.516% -58.875%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  2 (2.00%) high severe

     Running benches/encode.rs (target/release/deps/encode-104439108152a4b3)
encode data/octocat.png time:   [2.6665 ms 2.6696 ms 2.6731 ms]
                        change: [-11.459% -11.268% -11.058%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

I focused mainly on decoding, since we use it far more often than encoding anyway.

codecov bot commented Jul 8, 2024

Codecov Report

Attention: Patch coverage is 96.29630% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.97%. Comparing base (53c0bbe) to head (f49f21d).

Files           Patch %   Lines
src/base83.rs   93.33%    1 Missing ⚠️
src/lib.rs      96.87%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   87.00%   87.97%   +0.97%     
==========================================
  Files           6        6              
  Lines         300      316      +16     
==========================================
+ Hits          261      278      +17     
+ Misses         39       38       -1     


rubdos (Member, Author) commented Jul 8, 2024

encode data/octocat.png time:   [647.05 µs 648.14 µs 649.39 µs]
                        change: [-74.104% -73.880% -73.483%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

Oops.

rubdos (Member, Author) commented Jul 8, 2024

If we want more speed, we'll have to change the algorithm to use some fast, FFT-like equivalent of the DCT. I'm not currently in the mood for that. But I think with this PR we probably have the fastest blurhash around :'-)
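For context on where decode time goes: a blurhash decoder evaluates, for every pixel (x, y), the sum over basis functions (i, j) of coefficient * cos(pi * x * i / width) * cos(pi * y * j / height). Short of an FFT-style DCT, one commonly used trick is to precompute those cosines per column and per row and to collapse the vertical sum once per row, so the per-pixel work is only multiply-adds. This is a sketch of that separable-table technique, not necessarily what this PR does; `decode_linear` and `decode_linear_naive` are hypothetical names, and hash parsing and sRGB conversion are omitted.

```rust
use std::f32::consts::PI;

/// Separable evaluation of a blurhash-style cosine sum into linear RGB.
/// `coeffs[j * nx + i]` is the linear RGB weight of basis function (i, j).
fn decode_linear(coeffs: &[[f32; 3]], nx: usize, ny: usize, w: usize, h: usize) -> Vec<[f32; 3]> {
    assert_eq!(coeffs.len(), nx * ny, "expected nx * ny coefficients");
    // Cosine tables, computed once: cos_x[x * nx + i] = cos(pi * x * i / w), same for y.
    let cos_x: Vec<f32> = (0..w)
        .flat_map(|x| (0..nx).map(move |i| (PI * x as f32 * i as f32 / w as f32).cos()))
        .collect();
    let cos_y: Vec<f32> = (0..h)
        .flat_map(|y| (0..ny).map(move |j| (PI * y as f32 * j as f32 / h as f32).cos()))
        .collect();

    let mut out = vec![[0.0f32; 3]; w * h];
    let mut row = vec![[0.0f32; 3]; nx];
    for y in 0..h {
        // Collapse the vertical (j) sum once per row: row[i] = sum_j c[i, j] * cos_y[y, j].
        for (i, r) in row.iter_mut().enumerate() {
            *r = [0.0; 3];
            for j in 0..ny {
                let cy = cos_y[y * ny + j];
                let c = coeffs[j * nx + i];
                for k in 0..3 {
                    r[k] += c[k] * cy;
                }
            }
        }
        // Each pixel is now a short dot product over i.
        for x in 0..w {
            let px = &mut out[y * w + x];
            for (i, r) in row.iter().enumerate() {
                let cx = cos_x[x * nx + i];
                for k in 0..3 {
                    px[k] += r[k] * cx;
                }
            }
        }
    }
    out
}

/// Straightforward reference: recomputes both cosines for every pixel and basis.
fn decode_linear_naive(coeffs: &[[f32; 3]], nx: usize, ny: usize, w: usize, h: usize) -> Vec<[f32; 3]> {
    let mut out = vec![[0.0f32; 3]; w * h];
    for y in 0..h {
        for x in 0..w {
            let mut px = [0.0f32; 3];
            for j in 0..ny {
                for i in 0..nx {
                    let basis = (PI * x as f32 * i as f32 / w as f32).cos()
                        * (PI * y as f32 * j as f32 / h as f32).cos();
                    let c = coeffs[j * nx + i];
                    for k in 0..3 {
                        px[k] += c[k] * basis;
                    }
                }
            }
            out[y * w + x] = px;
        }
    }
    out
}

fn main() {
    // Hypothetical 3x2 coefficient set: a DC term plus a few AC terms.
    let coeffs: Vec<[f32; 3]> = vec![
        [0.5, 0.4, 0.3], [0.1, 0.0, -0.1], [0.05, 0.05, 0.05],
        [0.0, 0.2, 0.0], [-0.05, 0.0, 0.05], [0.02, 0.02, 0.02],
    ];
    let fast = decode_linear(&coeffs, 3, 2, 32, 32);
    let slow = decode_linear_naive(&coeffs, 3, 2, 32, 32);
    let max_err = fast.iter().zip(&slow)
        .flat_map(|(a, b)| (0..3).map(move |k| (a[k] - b[k]).abs()))
        .fold(0.0f32, f32::max);
    assert!(max_err < 1e-5, "max error {max_err}");
    println!("max abs difference vs naive: {max_err:e}");
}
```

The naive loop evaluates 2 * W * H * Nx * Ny cosines; the table version evaluates only W * Nx + H * Ny of them and does H * Nx * Ny + W * H * Nx multiply-adds per channel, at the cost of float results that may differ in the last bits because the sums are reordered.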

We should probably make this WASM-compatible and build a demo, like https://github.com/fpapado/blurhash-rust-wasm did before. Cc @fpapado

TL;DR: decoding takes ~60% less time (about 8 ms for 512x512, 80 µs for 50x50) and encoding ~77% less (688 µs for octocat.png), measured on my 7840U.
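Criterion reports `change` as the relative difference in estimated time, (new - old) / old, so a -60% change means a run now takes 0.4 times as long, i.e. roughly 2.5 times the throughput. A tiny hypothetical helper for turning the reported percentages into speed-up factors, using mid-point estimates from the dump:

```rust
/// Converts a Criterion-style relative change in run time (e.g. -0.60 for
/// "-60%") into a speed-up factor: old_time / new_time = 1 / (1 + change).
fn speedup(change: f64) -> f64 {
    assert!(change > -1.0, "a change of -100% or less is not a valid time ratio");
    1.0 / (1.0 + change)
}

fn main() {
    // Mid-point estimates copied from the benchmark dump in this thread.
    let decode_512 = -0.60468; // decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/512
    let encode_octocat = -0.77402; // encode data/octocat.png
    println!("decode /512: {:.2}x", speedup(decode_512));
    println!("encode octocat.png: {:.2}x", speedup(encode_octocat));
}
```

For these two figures this works out to roughly 2.5x for decoding and 4.4x for encoding.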

Benchmark result dump
decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/50
                        time:   [76.581 µs 77.061 µs 77.600 µs]
                        change: [-59.508% -58.760% -57.587%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/50
                        time:   [76.220 µs 76.484 µs 76.767 µs]
                        change: [-59.812% -59.648% -59.485%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/100
                        time:   [311.30 µs 312.27 µs 313.32 µs]
                        change: [-61.161% -60.924% -60.675%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/100
                        time:   [308.58 µs 310.34 µs 312.30 µs]
                        change: [-61.024% -60.820% -60.608%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  11 (11.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

Benchmarking decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/200
                        time:   [1.2214 ms 1.2257 ms 1.2309 ms]
                        change: [-61.385% -61.246% -61.108%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/200
                        time:   [1.2470 ms 1.2542 ms 1.2617 ms]
                        change: [-59.459% -59.035% -58.366%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/256
                        time:   [2.0413 ms 2.0476 ms 2.0543 ms]
                        change: [-60.866% -60.714% -60.558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/256
                        time:   [1.9808 ms 1.9856 ms 1.9908 ms]
                        change: [-60.847% -60.733% -60.626%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  19 (19.00%) high severe

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/500
                        time:   [7.7440 ms 7.7812 ms 7.8194 ms]
                        change: [-61.459% -61.235% -61.001%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/500
                        time:   [7.8458 ms 7.8854 ms 7.9258 ms]
                        change: [-60.554% -60.240% -59.941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/512
                        time:   [8.0230 ms 8.0522 ms 8.0823 ms]
                        change: [-60.630% -60.468% -60.301%] (p = 0.00 < 0.05)
                        Performance has improved.

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/512
                        time:   [7.9280 ms 7.9577 ms 7.9909 ms]
                        change: [-61.498% -61.274% -61.030%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./50
                        time:   [78.969 µs 79.119 µs 79.280 µs]
                        change: [-59.349% -58.869% -58.112%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./50
                        time:   [77.031 µs 77.480 µs 77.977 µs]
                        change: [-59.693% -59.190% -58.393%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./100
                        time:   [309.55 µs 310.95 µs 312.54 µs]
                        change: [-59.833% -59.728% -59.614%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  9 (9.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./100
                        time:   [303.18 µs 303.78 µs 304.42 µs]
                        change: [-61.100% -60.958% -60.826%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  8 (8.00%) high mild
  10 (10.00%) high severe

Benchmarking decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.2103 ms 1.2123 ms 1.2147 ms]
                        change: [-61.186% -61.009% -60.820%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

Benchmarking decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.2016 ms 1.2029 ms 1.2044 ms]
                        change: [-60.875% -60.521% -59.891%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./256
                        time:   [1.9795 ms 1.9854 ms 1.9919 ms]
                        change: [-60.766% -60.640% -60.505%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./256
                        time:   [2.0311 ms 2.0359 ms 2.0411 ms]
                        change: [-59.577% -59.467% -59.359%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  18 (18.00%) high mild
  1 (1.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./500
                        time:   [7.4842 ms 7.4936 ms 7.5048 ms]
                        change: [-61.239% -61.132% -61.028%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./500
                        time:   [7.7484 ms 7.7664 ms 7.7861 ms]
                        change: [-59.656% -59.513% -59.368%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  9 (9.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./512
                        time:   [8.1303 ms 8.1514 ms 8.1738 ms]
                        change: [-60.495% -60.316% -60.125%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./512
                        time:   [7.8848 ms 7.9084 ms 7.9352 ms]
                        change: [-61.865% -61.721% -61.575%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/50
                        time:   [78.130 µs 78.562 µs 79.062 µs]
                        change: [-60.030% -59.837% -59.660%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 100 measurements (25.00%)
  4 (4.00%) low severe
  14 (14.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/50
                        time:   [76.434 µs 76.588 µs 76.786 µs]
                        change: [-61.827% -61.258% -60.850%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/100
                        time:   [310.06 µs 310.75 µs 311.60 µs]
                        change: [-62.352% -61.424% -60.901%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/100
                        time:   [309.55 µs 311.01 µs 312.56 µs]
                        change: [-61.359% -60.753% -60.079%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200
                        time:   [1.2074 ms 1.2115 ms 1.2157 ms]
                        change: [-60.847% -60.468% -59.621%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

Benchmarking decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200
                        time:   [1.2369 ms 1.2449 ms 1.2525 ms]
                        change: [-61.300% -60.830% -60.125%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/256
                        time:   [2.0499 ms 2.0565 ms 2.0635 ms]
                        change: [-59.367% -59.214% -59.049%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/256
                        time:   [2.0361 ms 2.0413 ms 2.0468 ms]
                        change: [-60.688% -60.562% -60.429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) low mild
  15 (15.00%) high mild
  7 (7.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/500
                        time:   [7.7494 ms 7.7675 ms 7.7866 ms]
                        change: [-60.686% -60.568% -60.449%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  14 (14.00%) high mild

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/500
                        time:   [7.6974 ms 7.7092 ms 7.7231 ms]
                        change: [-61.733% -61.535% -61.357%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/512
                        time:   [8.0575 ms 8.1053 ms 8.1577 ms]
                        change: [-60.247% -59.996% -59.737%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/512
                        time:   [7.8932 ms 7.9117 ms 7.9319 ms]
                        change: [-61.642% -61.465% -61.283%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  16 (16.00%) high severe

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/50
                        time:   [79.184 µs 79.479 µs 79.779 µs]
                        change: [-59.620% -59.399% -59.192%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/50
                        time:   [76.670 µs 76.892 µs 77.165 µs]
                        change: [-61.079% -60.448% -60.071%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/100
                        time:   [311.67 µs 312.63 µs 313.67 µs]
                        change: [-61.709% -60.983% -60.272%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/100
                        time:   [308.45 µs 311.43 µs 314.53 µs]
                        change: [-60.671% -59.966% -59.062%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200
                        time:   [1.2453 ms 1.2511 ms 1.2570 ms]
                        change: [-60.492% -60.339% -60.195%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  13 (13.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200
                        time:   [1.2415 ms 1.2439 ms 1.2464 ms]
                        change: [-59.743% -59.632% -59.525%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/256
                        time:   [2.0237 ms 2.0368 ms 2.0506 ms]
                        change: [-60.193% -59.879% -59.549%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/256
                        time:   [2.0165 ms 2.0284 ms 2.0412 ms]
                        change: [-60.734% -60.403% -60.061%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/500
                        time:   [7.7285 ms 7.7631 ms 7.7992 ms]
                        change: [-60.265% -60.004% -59.757%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/500
                        time:   [7.7923 ms 7.8177 ms 7.8433 ms]
                        change: [-59.608% -59.445% -59.276%] (p = 0.00 < 0.05)
                        Performance has improved.

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/512
                        time:   [8.1224 ms 8.1486 ms 8.1750 ms]
                        change: [-60.953% -60.735% -60.530%] (p = 0.00 < 0.05)
                        Performance has improved.

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/512
                        time:   [7.9353 ms 7.9662 ms 7.9982 ms]
                        change: [-61.036% -60.846% -60.636%] (p = 0.00 < 0.05)
                        Performance has improved.

     Running benches/encode.rs (target/release/deps/encode-885ad83d48e0fb49)
encode data/SIPI_Jelly_Beans.tiff
                        time:   [697.87 µs 700.42 µs 703.01 µs]
                        change: [-77.324% -77.223% -77.116%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

encode data/octocat.png time:   [683.95 µs 688.12 µs 692.70 µs]
                        change: [-77.643% -77.402% -77.017%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

rubdos (Member, Author) commented Jul 8, 2024

diff --git a/src/util.rs b/src/util.rs
index b79e13c..28e2134 100644
--- a/src/util.rs
+++ b/src/util.rs
@@ -4,12 +4,12 @@ include!(concat!(env!("OUT_DIR"), "/srgb_lookup.rs"));
 pub fn linear_to_srgb(value: f32) -> u8 {
     let v = f32::max(0., f32::min(1., value));
     if v <= 0.003_130_8 {
-        (v * 12.92 * 255. + 0.5).round() as u8
+        (v * 12.92 * 255. + 0.5) as u8
     } else {
         // The original C implementation uses this formula:
         // ((1.055 * f32::powf(v, 1. / 2.4) - 0.055) * 255. + 0.5).round() as u8
         // But we can distribute the latter multiplication, to reduce the number of operations:
-        ((1.055 * 255.) * f32::powf(v, 1. / 2.4) - (0.055 * 255. - 0.5)).round() as u8
+        ((1.055 * 255.) * f32::powf(v, 1. / 2.4) - (0.055 * 255. - 0.5)) as u8
     }
 }

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.0087 ms 1.0139 ms 1.0194 ms]
                        change: [-17.801% -15.870% -13.643%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

... let's not.
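The two variants in the diff above are not numerically interchangeable, so dropping `.round()` changes the output and not only the speed. A float-to-integer `as` cast in Rust truncates toward zero and saturates at the target type's bounds, so `(x + 0.5) as u8` is round-half-up for non-negative `x`, while `(x + 0.5).round()` rounds a second time and in effect yields `floor(x) + 1`, capped at 255. A minimal comparison, with the final arithmetic isolated into two hypothetical helpers:

```rust
/// Arithmetic from the "-" lines of the diff: add 0.5, then round again.
fn with_round(x: f32) -> u8 {
    (x + 0.5).round() as u8
}

/// Arithmetic from the "+" lines: add 0.5 and let `as u8` truncate (and saturate).
fn without_round(x: f32) -> u8 {
    (x + 0.5) as u8
}

fn main() {
    for x in [0.0f32, 10.2, 10.6, 255.0] {
        println!("{x:>6}: with_round = {}, without_round = {}", with_round(x), without_round(x));
    }
}
```

For example, a scaled value of exactly 0.0 (a linear input of 0.0 in the first branch) becomes 1 with the `.round()` variant and 0 without it, while 10.6 becomes 11 either way.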

rubdos requested a review from gferon on July 8, 2024 at 12:44

gferon left a comment

Looks like you had fun, this is always welcome 🚀

rubdos merged commit 85938c4 into main on Jul 8, 2024
5 checks passed
gferon deleted the performance branch on July 23, 2024 at 09:09
2 participants